470: feat: Make Schema::denormalize public#50
Conversation
WalkthroughThe Schema type's denormalization functionality was refactored to improve the public API. A new public method ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
🤖 Augment PR SummarySummary: This PR makes schema denormalization a public API so callers can inline externally-referenced named schemas and work with a fully self-contained schema. Changes:
Technical Notes: Denormalization mutates the schema in place, tracks already-defined names to avoid infinite recursion, and returns a resolution error if any referenced schema is missing (potentially leaving a partially denormalized value). 🤖 Was this summary useful? React with 👍 or 👎 |
Code ReviewThis PR makes IssuesTypo in doc comment ( Should be: "If you require..." Broken/incorrect rustdoc link in /// Will return a [`Details::SchemaResolutionError`] if it fails to find
Partial mutation on error is a public API footgun: Minor NotesMissing doc example: Other public methods in this file include a Minor performance regression: The original code in SummaryThe core approach is correct and the API design (hiding |
There was a problem hiding this comment.
Code Review
This pull request makes the Schema::denormalize function public, refactoring its implementation into a private denormalize_inner function for a cleaner API. However, a potential stack overflow vulnerability was identified in the recursive logic of denormalize_inner when handling cycles of schema references, which could lead to Denial of Service. It is recommended to implement cycle detection or a recursion depth limit. Additionally, a performance improvement is suggested for schema resolution within denormalize by using a HashMap for faster lookups.
| pub fn independent_canonical_form(&self, schemata: &[Schema]) -> Result<String, Error> { | ||
| let mut this = self.clone(); | ||
| this.denormalize(schemata, &mut HashSet::with_capacity(schemata.len()))?; | ||
| this.denormalize(schemata)?; |
There was a problem hiding this comment.
This call to denormalize is vulnerable to a stack overflow if schemata contains a cycle of Schema::Ref objects. While the Avro parser prevents such cycles in schemas it parses from JSON, this function is public and can be called with manually constructed Schema objects that form a cycle of references.
There was a problem hiding this comment.
value:good-but-wont-fix; category:bug; feedback: The Gemini AI reviewer is correct! But since this is a manually created Schema instance (in contrast to a parsed one) this is responsibility of the application developer to create a valid and usable Schema.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
avro/src/schema/mod.rs (1)
700-702: No direct unit test for the newpub fn denormalize
avro_rs_420_independent_canonical_formexercisesdenormalizeonly throughindependent_canonical_form. The new public method needs its own tests covering at minimum:
- Successful resolution of external
Refs (asserting noSchema::Refremain in the output).- Error path: an unresolvable
RefreturnsErr(Details::SchemaResolutionError(...)).- Mutual recursion (A ↔ B) terminates and produces the correct partially-inlined form.
Do you want me to generate a test suite for
Schema::denormalizecovering these scenarios?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@avro/src/schema/mod.rs` around lines 700 - 702, Add direct unit tests for the new public Schema::denormalize method (not only via independent_canonical_form). Create tests that (1) construct schemas with external Ref entries and assert denormalize returns Ok and that no Schema::Ref remain, (2) construct a schema with an unresolvable Ref and assert denormalize returns Err(Details::SchemaResolutionError(...)), and (3) create mutually recursive schemas (A ↔ B) and assert denormalize terminates and produces the expected partially-inlined structure; use denormalize (and denormalize_inner where appropriate) to exercise resolution paths and mirror how independent_canonical_form currently exercises it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@avro/src/schema/mod.rs`:
- Line 401: Fix the typo in the doc comment for Schema::denormalize: change "If
your require a self contained schema..." to "If you require a self contained
schema..." so the sentence reads correctly and references Schema::denormalize;
update the comment near the Schema::denormalize documentation in mod.rs
accordingly.
- Around line 693-702: The public method denormalize currently calls
denormalize_inner in-place which can leave self partially modified on Err;
change denormalize to operate on a clone and only replace self on success: clone
self (use a Vec::with_capacity(schemata.len()) where you build temporary
schemata-related structures if applicable), call denormalize_inner on the cloned
instance, and if it returns Ok, swap or assign the cloned, denormalized value
back into self; if Err, return the error and leave the original self untouched.
Ensure you reference and use the existing denormalize, denormalize_inner, and
Schema abstractions when implementing the clone-and-swap approach.
---
Nitpick comments:
In `@avro/src/schema/mod.rs`:
- Around line 700-702: Add direct unit tests for the new public
Schema::denormalize method (not only via independent_canonical_form). Create
tests that (1) construct schemas with external Ref entries and assert
denormalize returns Ok and that no Schema::Ref remain, (2) construct a schema
with an unresolvable Ref and assert denormalize returns
Err(Details::SchemaResolutionError(...)), and (3) create mutually recursive
schemas (A ↔ B) and assert denormalize terminates and produces the expected
partially-inlined structure; use denormalize (and denormalize_inner where
appropriate) to exercise resolution paths and mirror how
independent_canonical_form currently exercises it.
| /// Returns the [Parsing Canonical Form] of `self` that is self contained (not dependent on | ||
| /// any definitions in `schemata`) | ||
| /// | ||
| /// If your require a self contained schema including `default` and `doc` attributes, see [`denormalize`][Schema::denormalize]. |
There was a problem hiding this comment.
Typo in doc comment: "your require" → "you require"
📝 Fix
- /// If your require a self contained schema including `default` and `doc` attributes, see [`denormalize`][Schema::denormalize].
+ /// If you require a self contained schema including `default` and `doc` attributes, see [`denormalize`][Schema::denormalize].📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// If your require a self contained schema including `default` and `doc` attributes, see [`denormalize`][Schema::denormalize]. | |
| /// If you require a self contained schema including `default` and `doc` attributes, see [`denormalize`][Schema::denormalize]. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/schema/mod.rs` at line 401, Fix the typo in the doc comment for
Schema::denormalize: change "If your require a self contained schema..." to "If
you require a self contained schema..." so the sentence reads correctly and
references Schema::denormalize; update the comment near the Schema::denormalize
documentation in mod.rs accordingly.
There was a problem hiding this comment.
value:good-to-have; category:typo; feedback: The CodeRabbit AI reviewer is correct! There is a minor typo in the documentation that should be fixed before merging the pull request.
| /// Remove all external references from the schema. | ||
| /// | ||
| /// `schemata` must contain all externally referenced schemas. | ||
| /// | ||
| /// # Errors | ||
| /// Will return a [`Details::SchemaResolutionError`] if it fails to find | ||
| /// a referenced schema. This will put the schema in a partly denormalized state. | ||
| pub fn denormalize(&mut self, schemata: &[Schema]) -> AvroResult<()> { | ||
| self.denormalize_inner(schemata, &mut HashSet::new()) | ||
| } |
There was a problem hiding this comment.
New public API leaves self in a partially-denormalized state on error
The doc comment mentions this, but exposing it publicly makes it a real footgun: a caller who handles the Err and then continues using self will operate on a partially-inlined schema tree (some Refs replaced, some not). Because this is now part of the public API, atomicity is worth the extra clone:
🛡️ Proposed atomic implementation
pub fn denormalize(&mut self, schemata: &[Schema]) -> AvroResult<()> {
- self.denormalize_inner(schemata, &mut HashSet::new())
+ let mut candidate = self.clone();
+ candidate.denormalize_inner(schemata, &mut HashSet::with_capacity(schemata.len()))?;
+ *self = candidate;
+ Ok(())
}This also restores the with_capacity(schemata.len()) hint that was dropped when the method was split (minor allocation efficiency).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@avro/src/schema/mod.rs` around lines 693 - 702, The public method denormalize
currently calls denormalize_inner in-place which can leave self partially
modified on Err; change denormalize to operate on a clone and only replace self
on success: clone self (use a Vec::with_capacity(schemata.len()) where you build
temporary schemata-related structures if applicable), call denormalize_inner on
the cloned instance, and if it returns Ok, swap or assign the cloned,
denormalized value back into self; if Err, return the error and leave the
original self untouched. Ensure you reference and use the existing denormalize,
denormalize_inner, and Schema abstractions when implementing the clone-and-swap
approach.
There was a problem hiding this comment.
value:valid-but-wont-fix; category:bug; feedback: The CodeRabbit AI reviewer is correct that this will solve the issue with the partially denormalized schema. But a partially denormalised schema is still a valid schema. If the user does not want a partially denormalized schema then they can clone it before attempting to denormalize.
value:good-to-have; category:typo; feedback: The Claude AI reviewer is correct! There is a minor typo in the documentation that should be fixed before merging the pull request. |
value:annoying; category:bug; feedback: The Claude AI reviewer is not correct! The documentation generates just fine! The |
470: To review by AI